Mark HTTP/1.1 async connection as not open (non-reusable) as soon as it becomes closed by the opposite endpoint #543
Conversation
```java
@@ -277,13 +278,14 @@ public final void onInput(final ByteBuffer src) throws HttpException, IOException
            return;
        }

        boolean endOfStream = false;
        if (incomingMessage == null) {
            final int bytesRead = inbuf.fill(ioSession);
```
I think there's a key issue right here: we can't read bytes _and_ `endOfStream` in the same operation. So even if the closure of the connection is already known to the host (at some level of abstraction), this code doesn't learn about it until the next event loop iteration, by which point the connection has already been returned to the pool and potentially reused.

I guess one question here is: should we read in a loop until `bytesRead` comes back as either 0 or -1? Kind of an edge-triggered approach, where you drain the file descriptor until `read()` returns would-block (`EAGAIN`). If possible, I think that approach, combined with this change set, would greatly reduce or eliminate purely internal race conditions in the client's connection management.
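As a rough illustration of that drain-until-would-block idea, here is a self-contained sketch in plain `java.nio` (not the actual duplexer code; `drainUntilBlockedOrEof` is a hypothetical helper). The point is that the caller learns about end of stream in the *same* event-loop iteration in which the last bytes arrive, rather than one iteration later:

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public final class DrainExample {

    /**
     * Reads from the channel until read() returns 0 (would block) or -1
     * (end of stream). Returns true if end of stream was observed, so the
     * caller can mark the connection as non-reusable immediately instead
     * of waiting for the next readiness event.
     */
    static boolean drainUntilBlockedOrEof(final ReadableByteChannel ch,
                                          final ByteBuffer dst) throws Exception {
        for (;;) {
            dst.clear();
            final int n = ch.read(dst);
            if (n == -1) {
                return true;   // opposite endpoint closed the connection
            }
            if (n == 0) {
                return false;  // would block; wait for the next input event
            }
            // process the bytes in dst here (omitted in this sketch)
        }
    }

    public static void main(final String[] args) throws Exception {
        // An in-memory channel stands in for the socket; it returns -1 at EOF.
        final ReadableByteChannel ch = Channels.newChannel(
                new ByteArrayInputStream("HTTP/1.1 200 OK\r\n\r\n".getBytes()));
        final ByteBuffer buf = ByteBuffer.allocate(8);
        System.out.println(drainUntilBlockedOrEof(ch, buf)); // prints "true"
    }
}
```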
> I guess one question here is: should we read in a loop until `bytesRead` comes back as either 0 or -1?
@rschmitt The protocol handler should be reading incoming data greedily until there is no more data to be read. I will review the code tomorrow to make sure it is really so.
@rschmitt You are right. The protocol handler reads message body content greedily but stops at the end of the message and waits until the next input event. I am looking into it.
@rschmitt Please test the updated change-set in your test environment and let me know if it improves the situation.
I ran my tests against your change set. Here are my findings:
- Uncontended connection reuse is moderately more reliable with HTTP, but not with HTTPS.
- Contended connection reuse is the same, with failure rates of 67% for small batches and 50% for large batches.
I've pushed my reproducer here so you can experiment with it. One thing I quickly discovered is that your change set is ineffective on HTTPS because `SSLIOSession::read` is just returning the same cached value every time:

```java
@Override
public int read(final ByteBuffer dst) {
    return endOfStream ? -1 : 0;
}
```

This is altogether different from HTTP, where I've confirmed that the duplexer reads `endOfStream` on the second call to `fillBuffer()`.
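To make that failure mode concrete, here is a toy model in plain `java.nio` (hypothetical class names, not the actual `SSLIOSession` internals): a layered session whose `read` only reports a cached flag never observes the peer's close itself, so EOF stays invisible to the caller until some other code path updates the flag, by which point the connection may already have been pooled and reused:

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public final class CachedEofExample {

    /** Toy stand-in for a layered session that only reports a cached flag. */
    static final class CachingLayer {
        private final ReadableByteChannel delegate;
        private boolean endOfStream;

        CachingLayer(final ReadableByteChannel delegate) {
            this.delegate = delegate;
        }

        /** Mirrors the shape of the quoted read(): no actual transport read. */
        int read(final ByteBuffer dst) {
            return endOfStream ? -1 : 0;
        }

        /** EOF only becomes visible after some other code path touches the transport. */
        void pump() throws Exception {
            if (delegate.read(ByteBuffer.allocate(16)) == -1) {
                endOfStream = true;
            }
        }
    }

    public static void main(final String[] args) throws Exception {
        // The underlying transport is already at EOF (peer has closed).
        final ReadableByteChannel closed =
                Channels.newChannel(new ByteArrayInputStream(new byte[0]));
        final CachingLayer layer = new CachingLayer(closed);
        System.out.println(layer.read(ByteBuffer.allocate(16))); // prints 0: EOF not yet observed
        layer.pump();                                            // flag updated out of band
        System.out.println(layer.read(ByteBuffer.allocate(16))); // prints -1
    }
}
```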
To me, the biggest mystery is why contended connection reuse is failing at such a consistently high rate:
```
Core version: 5.4-alpha1-SNAPSHOT
Client version: 5.6-alpha1-SNAPSHOT
Http: Sequential requests (rapid): 2,496 succeeded; 4 failed (99.84% success rate)
Http: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
Http: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
Http: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)
Https: Sequential requests (rapid): 2,235 succeeded; 265 failed (89.40% success rate)
Https: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
Https: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
Https: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)
```
One fascinating detail is that your change set caused the HTTP batch tests to see `RequestNotExecutedException`, where previously they saw `ConnectionClosedException`. Before:
```
Http: Sequential requests (rapid): 2,479 succeeded; 21 failed (99.16% success rate)
  14 not executed, 6 closed, 0 reset, 1 other
Http: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
  0 not executed, 0 closed, 0 reset, 0 other
Http: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
  0 not executed, 15 closed, 0 reset, 0 other
Http: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)
  0 not executed, 5 closed, 0 reset, 0 other
```
After:
```
Http: Sequential requests (rapid): 2,491 succeeded; 9 failed (99.64% success rate)
  9 not executed, 0 closed, 0 reset, 0 other
Http: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
  0 not executed, 0 closed, 0 reset, 0 other
Http: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
  15 not executed, 0 closed, 0 reset, 0 other
Http: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)
  5 not executed, 0 closed, 0 reset, 0 other
```
This must mean that the connection is being returned to the pool the very instant the HTTP response is read, before the second `fillBuffer()` call takes place (or it means that the new `endOfStream` check is ineffective).
> This must mean that the connection is being returned to the pool the very instant the HTTP response is read
@rschmitt This is precisely what the protocol handler does. As soon as the response message has been fully consumed and the connection is deemed reusable, the protocol handler hands it back to the connection pool, which may immediately lease it out to another caller. No amount of trickery can make the protocol handler aware of the opposite endpoint's intent not to honor the protocol requirements with regard to connection persistence, especially when TLS is used to negotiate the connection closure. It can get lucky and read the end of stream over a plain connection, but that is just luck.

There is no solution to this problem other than a request recovery and retry mechanism, and possibly some sort of stale connection check. I suggest we focus on that.
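For what it's worth, the stale-connection-check half of that suggestion can be approximated with a non-blocking read probe performed just before a pooled connection is leased out. This is a plain `java.nio` sketch under my own assumptions, not the client's actual validation API, and as noted above it cannot see a TLS `close_notify` without running the TLS layer:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;

public final class StaleCheckExample {

    /**
     * Probes the channel with a non-blocking read. A return of -1 means the
     * opposite endpoint has closed the connection, so it must not be reused;
     * 0 means nothing is pending and the connection looks healthy. A real
     * check would also have to cope with unsolicited buffered data rather
     * than consuming it, which this sketch glosses over.
     */
    static boolean isStale(final ReadableByteChannel ch) throws Exception {
        ((SelectableChannel) ch).configureBlocking(false);
        return ch.read(ByteBuffer.allocate(1)) == -1;
    }

    public static void main(final String[] args) throws Exception {
        // A Pipe stands in for the socket: sink = remote peer, source = our end.
        final Pipe alive = Pipe.open();
        System.out.println(isStale(alive.source())); // prints "false": open, no data pending

        final Pipe dead = Pipe.open();
        dead.sink().close();                         // peer closed its end
        System.out.println(isStale(dead.source()));  // prints "true"
    }
}
```

The probe only narrows the race window (the peer can still close between the check and the request), which is why a retry mechanism remains the primary fix.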
Your changes in this PR significantly improved things by causing `RequestNotExecutedException` (a safely retryable exception) to be thrown much more consistently in this case. I was hoping we could figure out how to do the same thing in the case of TLS.
> I was hoping we could figure out how to do the same thing in the case of TLS.
@rschmitt I do not think this can be done. Not without a major rewrite of the TLS layer. Presently the TLS layer and the HTTP protocol layers are decoupled, and the protocol layer can interact with the lower transport layer only through a generic `IOSession` interface. I do not see a way of making it reliably run through the TLS close notify handshake from the protocol handler.
I really think we ought to focus on #547 or something similar (such as using OPTIONS as a poor man's version of an H2 PING).
OK, fair enough. In that case, I'm indifferent on whether to ship the present PR. Plaintext HTTP is almost irrelevant today.

I do have some unfinished business involving a bug in UDS connection closure with `CloseMode.GRACEFUL`, and I think it might be related to the things we're talking about in this thread. I'll try to get back to that this week.
@rschmitt This should reduce the window of the stale connection lease race condition somewhat.